Smart Paper Notes

Rethinking Skip Connections in Encoder-decoder Networks for Monocular Depth Estimation

Zhitong Lai , Haichao Sun , Rui Tian , Nannan Ding , Zhiguo Wu , Yanjie Wang

Categories: Computer Vision | Artificial Intelligence

2022-08-29

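Skip connections are fundamental units in encoder-decoder networks and improve feature propagation in neural networks. However, most methods with skip connections only connect features of the same resolution in the encoder and the decoder, which ignores the information lost in the encoder as the layers go deeper. To make use of the information that is lost from the features in the shallower layers of the encoder, we propose a full skip connection network (FSCN) for the monocular depth estimation task. In addition, to fuse the features within skip connections more closely, we propose an adaptive concatenation module (ACM). Furthermore, we conduct extensive experiments with FSCN on outdoor and indoor datasets (i.e., the KITTI dataset and the NYU Depth dataset).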

Boosted Dynamic Neural Networks

Haichao Yu , Haoxiang Li , Gang Hua , Gao Huang , Humphrey Shi

Categories: Machine Learning | Computer Vision

2022-11-30

Early-exiting dynamic neural networks (EDNNs), one type of dynamic neural network, have been widely studied recently. A typical EDNN has multiple prediction heads at different layers of the network backbone. During inference, the model exits at either the last prediction head or an intermediate prediction head where the prediction confidence is higher than a predefined threshold. To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data. This causes a train-test mismatch: all the prediction heads are optimized on all types of data in the training phase, while the deeper heads only see difficult inputs in the testing phase. Treating inputs differently in the two phases creates a mismatch between the training and testing data distributions. To mitigate this problem, we formulate an EDNN as an additive model inspired by gradient boosting, and propose multiple training techniques to optimize the model effectively. We name our method BoostNet. Our experiments show it achieves state-of-the-art performance on the CIFAR100 and ImageNet datasets in both anytime and budgeted-batch prediction modes. Our code is released at https://github.com/SHI-Labs/Boosted-Dynamic-Networks.
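The early-exit inference rule described in this abstract can be sketched as follows. This is a minimal illustration, not the BoostNet implementation: the function name `early_exit_predict` is hypothetical, and each head is modeled as an independent callable returning class probabilities, whereas a real EDNN shares one backbone across heads.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a head maps an input to a list of class probabilities.
Head = Callable[[Sequence[float]], List[float]]


def early_exit_predict(
    heads: Sequence[Head], x: Sequence[float], threshold: float
) -> Tuple[int, int]:
    """Return (predicted_class, exit_index) for a single input.

    Exit at the first head whose top probability exceeds `threshold`;
    the last head always answers, whatever its confidence.
    """
    if not heads:
        raise ValueError("heads must be non-empty")
    for i, head in enumerate(heads):
        probs = head(x)
        confidence = max(probs)
        is_last = i == len(heads) - 1
        if confidence > threshold or is_last:
            return probs.index(confidence), i
    raise AssertionError("unreachable: the last head always returns")


# Toy heads with fixed outputs (made-up numbers, for illustration only).
toy_heads = [lambda x: [0.6, 0.4], lambda x: [0.1, 0.9]]
print(early_exit_predict(toy_heads, [0.0], threshold=0.8))  # (1, 1)
print(early_exit_predict(toy_heads, [0.0], threshold=0.5))  # (0, 0)
```

With a high threshold, the shallow head is not confident enough and the input falls through to the deeper head. This is why, at test time, deeper heads only see the inputs that earlier heads found difficult.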

Multi-Objective Evolutionary for Object Detection Mobile Architectures Search

Haichao Zhang , Jiashi Li , Xin Xia , Kuangrong Hao , Xuefeng Xiao

Categories: Computer Vision | Machine Learning

2022-11-05

Recently, neural architecture search (NAS) has achieved great success on classification tasks for mobile devices. The backbone network for object detection is usually obtained from the image classification task. However, an architecture searched on the classification task is sub-optimal because of the gap between image classification and object detection. Meanwhile, work on backbone architecture search for object detection on mobile devices is limited, mainly because the backbone always requires expensive ImageNet pre-training. Accordingly, it is necessary to study network architecture search for mobile object detection without expensive pre-training. In this work, we propose a backbone architecture search algorithm for mobile object detection, an evolutionary optimization method based on non-dominated sorting for NAS scenarios. It can quickly find a backbone architecture within given constraints, and it better addresses the suboptimality of linearly combining accuracy and computational cost into a single objective. The proposed approach can search backbone networks with different depths, widths, or expansion sizes via a weight mapping technique, making NAS for mobile detection tasks much more efficient. In our experiments, we verify the effectiveness of the proposed approach on YoloX-Lite, a lightweight version of the target detection framework. Under similar computational complexity, the backbone architecture we find is 2.0% mAP more accurate than MobileDet. Our improved backbone network reduces the computational effort while improving the accuracy of the object detection network. To demonstrate its effectiveness, we carry out a series of ablation studies and analyze the working mechanism in detail.
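Non-dominated sorting, which the abstract names as the basis of the search, can be sketched as below for two objectives: accuracy to maximize and cost to minimize. This is a generic textbook-style sketch, not the authors' algorithm. The names `dominates` and `non_dominated_sort` and the candidate values are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical candidate encoding: (accuracy to maximize, cost to minimize).
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated_sort(population: List[Candidate]) -> List[List[int]]:
    """Split candidate indices into fronts; front 0 is the Pareto front."""
    remaining = set(range(len(population)))
    fronts: List[List[int]] = []
    while remaining:
        # A candidate joins the current front if nothing left dominates it.
        front = sorted(
            i
            for i in remaining
            if not any(
                dominates(population[j], population[i])
                for j in remaining
                if j != i
            )
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Made-up (mAP, GFLOPs) pairs, for illustration only.
candidates = [(0.30, 1.0), (0.32, 1.5), (0.28, 1.2), (0.32, 1.0)]
print(non_dominated_sort(candidates))  # [[3], [0, 1], [2]]
```

Ranking by fronts keeps every accuracy/cost trade-off that is not strictly beaten, instead of collapsing both objectives into one weighted sum.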

VEM$^2$L: A Plug-and-play Framework for Fusing Text and Structure Knowledge on Sparse Knowledge Graph Completion

Tao He , Tianwen Jiang , Zihao Zheng , Haichao Zhu , Jingrun Zhang , Ming Liu , Sendong Zhao , Bin Qin

Categories: Natural Language Processing

2022-07-04

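Knowledge graph completion has been widely studied recently. It completes the missing elements of triples mainly by modeling graph structural features, but it is sensitive to the sparsity of the graph structure. Relevant texts such as entity names and descriptions, which act as another form of expression for knowledge graphs (KGs), are expected to address this challenge. Several methods have been proposed that use both structure and text messages with two encoders, but they achieve only limited improvements because they fail to balance the weights between the two. Keeping both the structural and textual encoders during inference also incurs a heavy parameter cost. Motivated by knowledge distillation, we view knowledge as a mapping from inputs to output probabilities, and propose a plug-and-play framework, VEM2L, over sparse KGs to fuse the knowledge extracted from text and structure messages into a unified whole. Specifically, we partition the knowledge acquired by the models into two non-overlapping parts. One part relates to the fitting capacity on the training triples, and can be fused by encouraging the two encoders to learn from each other on the training set. The other reflects the generalization ability on unobserved queries. Correspondingly, we propose a new fusion strategy, justified by the variational EM algorithm, to fuse the generalization ability of the models, during which we also apply graph densification operations to further alleviate the sparse graph problem. Combining these two fusion methods gives the final VEM2L framework. Detailed theoretical evidence, together with quantitative and qualitative experiments, demonstrates the effectiveness and efficiency of the proposed framework.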

Distilled Dual-Encoder Model for Vision-Language Understanding

Zekun Wang , Wenhui Wang , Haichao Zhu , Ming Liu , Bing Qin , Furu Wei

Categories: Natural Language Processing | Computer Vision

2021-12-16

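We propose a cross-modal attention distillation framework for training a dual-encoder model on vision-language understanding tasks, such as visual reasoning and visual question answering. Dual-encoder models have a faster inference speed than fusion-encoder models and allow images and text to be pre-computed during inference. However, the shallow interaction module used in dual-encoder models is insufficient for complex vision-language understanding tasks. To learn deep interactions between images and text, we introduce cross-modal attention distillation, which uses the image-to-text and text-to-image attention distributions of a fusion-encoder model to guide the training of our dual-encoder model. In addition, we show that applying cross-modal attention distillation in both the pre-training and fine-tuning stages yields further improvements. Experimental results show that the distilled dual-encoder model achieves competitive performance on visual reasoning, visual entailment, and visual question answering, while enjoying a much faster inference speed than fusion-encoder models. Our code and models will be publicly available at https://github.com/kugwzk/Distilled-DualEncoder.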
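One way to picture guiding a student with a teacher's attention distributions is a divergence between matching attention rows. The sketch below uses the KL divergence, which is an assumption: the abstract does not name the loss. The function names `kl_divergence` and `attention_distillation_loss` and the attention values are likewise hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length.

    `eps` guards against log(0) when the student assigns zero attention.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def attention_distillation_loss(
    teacher_attn: Sequence[Sequence[float]],
    student_attn: Sequence[Sequence[float]],
) -> float:
    """Mean KL(teacher || student) over attention rows (one row per query token).

    Assumption: each row is already a softmax-normalized distribution.
    """
    rows = [kl_divergence(t, s) for t, s in zip(teacher_attn, student_attn)]
    return sum(rows) / len(rows)


# Made-up text-to-image attention rows for two text tokens over two patches.
teacher = [[0.9, 0.1], [0.2, 0.8]]
student = [[0.5, 0.5], [0.2, 0.8]]
print(attention_distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's attention exactly and grows as the two distributions diverge. In the setting the abstract describes, such a term would be computed for both the image-to-text and text-to-image directions.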

Dropout Prediction Uncertainty Estimation Using Neuron Activation Strength

Haichao Yu , Zhe Chen , Dong Lin , Gil Shamir , Jie Han

Categories: Machine Learning

2021-10-13

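Dropout is commonly used to quantify prediction uncertainty, i.e., the variation of model predictions on a given input example. However, using dropout in practice can be expensive because it requires running dropout inference many times. In this paper, we study how to estimate dropout prediction uncertainty in a resource-efficient manner. We demonstrate that neuron activation strengths can be used to estimate dropout prediction uncertainty under different dropout settings and on a variety of tasks, using three large datasets (MovieLens, Criteo, and EMNIST). Our approach provides an inference-once method that estimates dropout prediction uncertainty as a cheap auxiliary task. We also demonstrate that activation features from a subset of the neural network layers can be sufficient to achieve uncertainty estimation performance almost comparable to that of using activation features from all layers, which further reduces the resources needed for uncertainty estimation.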
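For context, the expensive baseline that the abstract refers to, running dropout inference many times and measuring the spread of the predictions, can be sketched as below. This is not the paper's activation-strength estimator. The toy linear model, the function names, and the numbers are all assumptions made for illustration.

```python
import random
from statistics import pvariance
from typing import Sequence


def dropout_forward(
    weights: Sequence[float], x: Sequence[float], rate: float, rng: random.Random
) -> float:
    """One stochastic pass of a toy linear model with inverted dropout.

    Each input is dropped with probability `rate` (must be < 1); kept
    inputs are rescaled by 1 / (1 - rate).
    """
    keep = 1.0 - rate
    return sum(w * xi / keep for w, xi in zip(weights, x) if rng.random() < keep)


def dropout_uncertainty(
    weights: Sequence[float],
    x: Sequence[float],
    rate: float,
    passes: int,
    seed: int = 0,
) -> float:
    """Variance of the prediction across `passes` dropout forward passes."""
    rng = random.Random(seed)
    preds = [dropout_forward(weights, x, rate, rng) for _ in range(passes)]
    return pvariance(preds)


# Made-up weights and input; the cost grows linearly with `passes`.
print(dropout_uncertainty([1.0, 2.0], [1.0, 1.0], rate=0.5, passes=200))
```

Each uncertainty estimate here costs `passes` forward passes. The abstract's goal is to estimate this quantity in a single inference instead.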

Transforming Wikipedia into Augmented Data for Query-Focused Summarization

Haichao Zhu , Li Dong , Furu Wei , Bing Qin , Ting Liu

Categories: Natural Language Processing

2019-11-08

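The limited size of existing query-focused summarization datasets makes it challenging to train data-driven summarization models. Meanwhile, manually constructing a query-focused summarization corpus is costly and time-consuming. In this paper, we use Wikipedia to automatically collect a large query-focused summarization dataset (named WikiRef) of more than 280,000 examples, which can serve as a means of data augmentation. We also develop a BERT-based query-focused summarization model (Q-BERT) that extracts sentences from the documents as summaries. To better adapt a huge model containing millions of parameters, we identify and fine-tune only a sparse subnetwork, which corresponds to a small fraction of the whole model's parameters. Experimental results on three DUC benchmarks show that the model pre-trained on WikiRef already achieves reasonable performance. After fine-tuning on the specific benchmark datasets, the model with data augmentation outperforms strong comparison systems. Moreover, both the proposed Q-BERT model and subnetwork fine-tuning further improve model performance. The dataset is publicly available at https://aka.ms/wikiref.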

Cross Modal Transformer via Coordinates Encoding for 3D Object Detection

Junjie Yan , Yingfei Liu , Jianjian Sun , Fan Jia , Shuailin Li , Tiancai Wang , Xiangyu Zhang

Categories: Computer Vision

2023-01-03

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.

KoopmanLab: A PyTorch module of Koopman neural operator family for solving partial differential equations

Wei Xiong , Muyuan Ma , Pei Sun , Yang Tian

Categories: Machine Learning

2023-01-03

Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solver, suggest that this challenge may be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab for diverse applications of partial differential equations.

Ranking Differential Privacy

Shirong Xu , Will Wei Sun , Guang Cheng

Categories: Machine Learning (Statistics) | Machine Learning

2023-01-02

Rankings are widely collected in various real-life scenarios, leading to the leakage of personal information such as users' preferences on videos or news. To protect rankings, existing works mainly develop privacy protection on a single ranking within a set of rankings or on pairwise comparisons of a ranking under $\epsilon$-differential privacy. This paper proposes a novel notion called $\epsilon$-ranking differential privacy for protecting ranks. We establish the connection between the Mallows model (Mallows, 1957) and the proposed $\epsilon$-ranking differential privacy. This allows us to develop a multistage ranking algorithm that generates synthetic rankings while satisfying the developed $\epsilon$-ranking differential privacy. We establish theoretical results regarding the utility of synthetic rankings in downstream tasks, including the inference attack and the personalized ranking tasks. For the inference attack, we quantify how $\epsilon$ affects the estimation of the true ranking based on synthetic rankings. For the personalized ranking task, we consider varying privacy preferences among users and quantify how their privacy preferences affect the consistency in estimating the optimal ranking function. Extensive numerical experiments are carried out to verify the theoretical results and demonstrate the effectiveness of the proposed synthetic ranking algorithm.
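As background for the Mallows model mentioned above, the sketch below computes the common textbook form, in which a ranking's probability decays exponentially with its Kendall tau distance from a central ranking. The paper's exact parametrization, and how the dispersion relates to $\epsilon$, are not given in the abstract, so the parameter `theta` and the function names are assumptions.

```python
import math
from itertools import combinations, permutations
from typing import Dict, Sequence, Tuple


def kendall_tau(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of item pairs that rankings a and b order differently."""
    pos_a = {item: i for i, item in enumerate(a)}
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def mallows_pmf(center: Sequence[int], theta: float) -> Dict[Tuple[int, ...], float]:
    """Exact Mallows probabilities, P(r) proportional to exp(-theta * d(r, center)).

    Enumerates every permutation, so it is only practical for a few items.
    """
    weights = {
        perm: math.exp(-theta * kendall_tau(perm, center))
        for perm in permutations(center)
    }
    normalizer = sum(weights.values())
    return {perm: w / normalizer for perm, w in weights.items()}


pmf = mallows_pmf((1, 2, 3), theta=1.0)
print(max(pmf, key=pmf.get))  # (1, 2, 3)
```

A larger `theta` concentrates probability on the central ranking, while `theta = 0` gives the uniform distribution over all rankings. This makes the dispersion parameter a natural knob for trading privacy against utility when sampling synthetic rankings.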
